⚡️ Speed up function `_byte_to_line_index` by 39% in PR #1199 (`omni-java`) by codeflash-ai[bot] · Pull Request #1611 · codeflash-ai/codeflash

codeflash-ai · 2026-02-20T14:23:02Z

⚡️ This pull request contains optimizations for PR #1199

If you approve this dependent PR, these changes will be merged into the original PR branch omni-java.

This PR will be automatically closed if the original PR is merged.

📄 39% (0.39x) speedup for `_byte_to_line_index` in `codeflash/languages/java/instrumentation.py`

⏱️ Runtime : 924 microseconds → 663 microseconds (best of 163 runs)

📝 Explanation and details

The optimized code achieves a 39% runtime improvement through two key micro-optimizations that reduce per-call overhead in this frequently-executed helper function:

Primary Optimizations

1. Direct import binding: Changed from bisect.bisect_right() to importing bisect_right directly as _bisect_right. This eliminates the attribute lookup (bisect.) on every function call. The line profiler shows the bisect line dropping from 977304ns to 887516ns in total, a saving of about 90µs over the profiled run, or roughly 40ns per call across the 2,158 calls.
2. Conditional expression over max(): Replaced max(0, idx) with idx if idx > 0 else 0. This avoids the name lookup and call overhead of the built-in max() function, reducing this line's execution time by ~40% (622046ns → 379545ns per the profiler).
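The two changes can be sketched side by side. The PR text does not show the function body, so the code below is a reconstruction from the description and the generated tests; the function names and parameter names are assumptions for illustration, not the actual source of `instrumentation.py`.

```python
import bisect
from bisect import bisect_right as _bisect_right


def byte_to_line_index_original(byte_offset, line_byte_starts):
    # Assumed original form: the attribute lookup bisect.bisect_right
    # is resolved on every call.
    idx = bisect.bisect_right(line_byte_starts, byte_offset) - 1
    # Clamp negative results to 0 through the built-in max().
    return max(0, idx)


def byte_to_line_index_optimized(byte_offset, line_byte_starts):
    # Assumed optimized form: bisect_right is bound once at import time.
    idx = _bisect_right(line_byte_starts, byte_offset) - 1
    # Same clamp expressed as a conditional expression, with no call.
    return idx if idx > 0 else 0
```

Both variants return the index of the last line start that is less than or equal to the offset, clamped to 0 for empty lists and for offsets before the first start.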

Why This Matters

The function maps byte offsets to line indices using binary search, a core operation that happens 2,158 times in the profiled workload. These micro-optimizations compound significantly:

- Test results show speedups of roughly 26-71% across all cases, with the largest improvements (60-70%) occurring in edge cases like empty lists or single elements, where the overhead of max() represents a larger proportion of total execution time.
- Large-scale tests (1000-line files with multiple queries) still achieve 27-43% improvements, so the gain holds as the list grows.
- The optimization is particularly effective for hot-path scenarios like sequential offset queries (42.6% faster); the dense line-start test shows the smallest gain (25.7%).

The changes preserve all behavior including edge case handling (negative indices, empty lists) while delivering substantial performance gains through elimination of unnecessary Python-level function call overhead.
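A rough way to reproduce this kind of per-call comparison is a `timeit` micro-benchmark. The sketch below is not the harness Codeflash uses; the function bodies are assumed reconstructions, `best_time_per_call` is a hypothetical helper, and absolute numbers will differ by machine and Python version.

```python
import bisect
import timeit
from bisect import bisect_right as _bisect_right

# 1000 line starts spaced 10 bytes apart, as in the large-scale tests.
STARTS = [i * 10 for i in range(1000)]


def original(byte_offset, line_byte_starts):
    idx = bisect.bisect_right(line_byte_starts, byte_offset) - 1
    return max(0, idx)


def optimized(byte_offset, line_byte_starts):
    idx = _bisect_right(line_byte_starts, byte_offset) - 1
    return idx if idx > 0 else 0


def best_time_per_call(func, calls=20_000, repeats=5):
    # The lambda adds the same fixed overhead to both variants,
    # so only the difference between results is meaningful.
    timer = timeit.Timer(lambda: func(4321, STARTS))
    return min(timer.repeat(repeat=repeats, number=calls)) / calls


if __name__ == "__main__":
    for name, func in (("original", original), ("optimized", optimized)):
        print(f"{name}: {best_time_per_call(func) * 1e9:.0f} ns per call")
```

Taking the minimum over several repeats mirrors the "best of N runs" style of the reported runtime and reduces noise from other processes.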

✅ Correctness verification report:

Test	Status
⚙️ Existing Unit Tests	🔘 None Found
🌀 Generated Regression Tests	✅ 2158 Passed
⏪ Replay Tests	🔘 None Found
🔎 Concolic Coverage Tests	🔘 None Found
📊 Tests Coverage	100.0%

🌀 Click to see Generated Regression Tests

import pytest  # used for our unit tests
from codeflash.languages.java.instrumentation import _byte_to_line_index

def test_basic_mapping_at_and_between_starts():
    # Basic, small, easy-to-reason-about line byte starts
    starts = [0, 10, 20]  # three lines starting at bytes 0, 10, and 20

    # If the offset is exactly at a start, we expect that start's index.
    # bisect_right places the insertion point to the right of equal entries,
    # then function subtracts 1, yielding the index of that equal start.
    codeflash_output = _byte_to_line_index(0, starts) # 1.10μs -> 741ns (48.7% faster)
    codeflash_output = _byte_to_line_index(10, starts) # 611ns -> 431ns (41.8% faster)
    codeflash_output = _byte_to_line_index(20, starts) # 411ns -> 300ns (37.0% faster)

    # Offsets between starts should map to the previous start's index.
    codeflash_output = _byte_to_line_index(5, starts) # 440ns -> 271ns (62.4% faster)
    codeflash_output = _byte_to_line_index(15, starts) # 391ns -> 270ns (44.8% faster)

    # Offsets beyond the last start should map to the last index.
    codeflash_output = _byte_to_line_index(9999, starts) # 340ns -> 220ns (54.5% faster)

def test_edge_empty_list_and_negative_offsets_and_single_element():
    # If there are no starts, bisect_right returns 0 -> idx = -1 -> max(0, -1) => 0
    codeflash_output = _byte_to_line_index(0, []) # 912ns -> 541ns (68.6% faster)
    codeflash_output = _byte_to_line_index(123, []) # 461ns -> 281ns (64.1% faster)
    codeflash_output = _byte_to_line_index(-100, []) # 330ns -> 201ns (64.2% faster)

    # Negative offsets with non-empty starts also should clamp to 0
    starts = [10, 20, 30]
    codeflash_output = _byte_to_line_index(-1, starts) # 431ns -> 330ns (30.6% faster)

    # Single-element list: regardless of offset relative to the single start,
    # the algorithm always returns 0 because idx will be 0 or -1, and max keeps 0.
    single = [5]
    for off in (-10, 0, 4, 5, 6, 100):
        codeflash_output = _byte_to_line_index(off, single) # 2.12μs -> 1.34μs (58.3% faster)

def test_edge_duplicates_and_float_support():
    # Duplicate start entries: bisect_right will place insertion point to the right
    # of duplicates, so an offset equal to the duplicated start yields the last duplicate index.
    starts_with_duplicates = [0, 10, 10, 20]
    codeflash_output = _byte_to_line_index(0, starts_with_duplicates) # 1.07μs -> 671ns (59.8% faster)
    # At 10, there are two identical starts at indices 1 and 2.
    # bisect_right places insertion after them -> idx = insertion-1 = 2
    codeflash_output = _byte_to_line_index(10, starts_with_duplicates) # 581ns -> 401ns (44.9% faster)
    # Between 10 and 20 should map to the last 10's index (2)
    codeflash_output = _byte_to_line_index(11, starts_with_duplicates) # 391ns -> 260ns (50.4% faster)

    # Although the function is annotated for ints, bisect works with floats as well.
    # Verify correct behavior with float starts and float offsets.
    float_starts = [0.0, 2.5, 5.5]
    codeflash_output = _byte_to_line_index(0.0, float_starts) # 451ns -> 321ns (40.5% faster)
    codeflash_output = _byte_to_line_index(2.4, float_starts) # 401ns -> 251ns (59.8% faster)
    codeflash_output = _byte_to_line_index(2.5, float_starts) # 411ns -> 290ns (41.7% faster)
    codeflash_output = _byte_to_line_index(5.5, float_starts) # 391ns -> 260ns (50.4% faster)
    codeflash_output = _byte_to_line_index(6.0, float_starts) # 350ns -> 210ns (66.7% faster)

def test_large_scale_mapping_consistency():
    # Large-scale test with 1000 line starts spaced by 10 bytes each.
    n = 1000
    starts = [i * 10 for i in range(n)]  # deterministic, sorted starts [0,10,20,...,9990]

    # 1) Offsets that are exactly at start points should map to that start's index.
    #    We test many such offsets to ensure bisect_right behavior is preserved at scale.
    for i in range(0, n, 50):  # step by 50 to keep assertions informative while covering range
        offset = i * 10
        expected = i
        codeflash_output = _byte_to_line_index(offset, starts) # 10.0μs -> 7.37μs (36.0% faster)

    # 2) Offsets that are in the middle between two starts should map to the lower index.
    #    For each i, offset = i*10 + 5 lies between start[i] and start[i+1], so expect i.
    for i in range(0, n - 1, 50):
        offset = i * 10 + 5
        expected = i
        codeflash_output = _byte_to_line_index(offset, starts) # 9.24μs -> 6.68μs (38.3% faster)

    # 3) Offsets beyond the last start should map to the last index (n-1).
    codeflash_output = _byte_to_line_index(n * 10 + 12345, starts) # 441ns -> 330ns (33.6% faster)

    # 4) A comprehensive sweep (1000 checks) to ensure correctness across many offsets.
    #    This is a heavier check but still deterministic and bounded per the instructions.
    for i in range(n):
        # Pick an offset that is guaranteed to map to index i: choose i*10 (exact start)
        offset_exact = i * 10
        codeflash_output = _byte_to_line_index(offset_exact, starts) # 416μs -> 299μs (39.0% faster)

        # Also pick a midpoint between i and i+1 (except for last index).
        if i < n - 1:
            offset_mid = i * 10 + 7  # inside the range for index i
            codeflash_output = _byte_to_line_index(offset_mid, starts)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

import bisect

# imports
import pytest
from codeflash.languages.java.instrumentation import _byte_to_line_index

def test_single_line_offset_at_start():
    """Test offset 0 with a single-line file returns index 0."""
    line_byte_starts = [0]
    codeflash_output = _byte_to_line_index(0, line_byte_starts); result = codeflash_output # 1.12μs -> 721ns (55.6% faster)

def test_single_line_offset_in_middle():
    """Test offset within a single line returns index 0."""
    line_byte_starts = [0]
    codeflash_output = _byte_to_line_index(5, line_byte_starts); result = codeflash_output # 991ns -> 621ns (59.6% faster)

def test_two_lines_offset_in_first_line():
    """Test offset in first line of multi-line file."""
    line_byte_starts = [0, 10]
    codeflash_output = _byte_to_line_index(5, line_byte_starts); result = codeflash_output # 971ns -> 601ns (61.6% faster)

def test_two_lines_offset_in_second_line():
    """Test offset in second line of multi-line file."""
    line_byte_starts = [0, 10]
    codeflash_output = _byte_to_line_index(15, line_byte_starts); result = codeflash_output # 912ns -> 651ns (40.1% faster)

def test_two_lines_offset_at_second_line_start():
    """Test offset exactly at the start of second line."""
    line_byte_starts = [0, 10]
    codeflash_output = _byte_to_line_index(10, line_byte_starts); result = codeflash_output # 942ns -> 651ns (44.7% faster)

def test_three_lines_offset_in_middle_line():
    """Test offset in middle line of three-line file."""
    line_byte_starts = [0, 10, 20]
    codeflash_output = _byte_to_line_index(15, line_byte_starts); result = codeflash_output # 942ns -> 651ns (44.7% faster)

def test_three_lines_offset_in_last_line():
    """Test offset in last line of three-line file."""
    line_byte_starts = [0, 10, 20]
    codeflash_output = _byte_to_line_index(25, line_byte_starts); result = codeflash_output # 902ns -> 621ns (45.2% faster)

def test_large_offset_values():
    """Test with large byte offset values."""
    line_byte_starts = [0, 1000, 2000, 3000]
    codeflash_output = _byte_to_line_index(2500, line_byte_starts); result = codeflash_output # 922ns -> 601ns (53.4% faster)

def test_many_lines():
    """Test with many lines (10 lines)."""
    line_byte_starts = [i * 100 for i in range(10)]
    codeflash_output = _byte_to_line_index(550, line_byte_starts); result = codeflash_output # 962ns -> 661ns (45.5% faster)

def test_empty_line_byte_starts():
    """Test with empty line_byte_starts list."""
    line_byte_starts = []
    codeflash_output = _byte_to_line_index(0, line_byte_starts); result = codeflash_output # 862ns -> 551ns (56.4% faster)

def test_empty_list_with_nonzero_offset():
    """Test with empty list and non-zero offset."""
    line_byte_starts = []
    codeflash_output = _byte_to_line_index(10, line_byte_starts); result = codeflash_output # 871ns -> 510ns (70.8% faster)

def test_offset_zero_with_multiple_lines():
    """Test offset 0 with multiple lines always returns 0."""
    line_byte_starts = [0, 5, 10, 15]
    codeflash_output = _byte_to_line_index(0, line_byte_starts); result = codeflash_output # 1.02μs -> 721ns (41.7% faster)

def test_offset_before_all_lines():
    """Test with offset before the first line start."""
    line_byte_starts = [10, 20, 30]
    codeflash_output = _byte_to_line_index(5, line_byte_starts); result = codeflash_output # 982ns -> 651ns (50.8% faster)

def test_offset_exactly_at_first_line_start():
    """Test offset exactly at first line start (non-zero)."""
    line_byte_starts = [10, 20, 30]
    codeflash_output = _byte_to_line_index(10, line_byte_starts); result = codeflash_output # 1.00μs -> 662ns (51.4% faster)

def test_offset_beyond_all_lines():
    """Test with offset far beyond the last line."""
    line_byte_starts = [0, 10, 20]
    codeflash_output = _byte_to_line_index(1000, line_byte_starts); result = codeflash_output # 902ns -> 661ns (36.5% faster)

def test_single_large_offset_value():
    """Test with a very large single offset value."""
    line_byte_starts = [0]
    codeflash_output = _byte_to_line_index(1000000, line_byte_starts); result = codeflash_output # 962ns -> 582ns (65.3% faster)

def test_lines_with_zero_starts():
    """Test with line_byte_starts containing only zeros."""
    line_byte_starts = [0, 0, 0]
    codeflash_output = _byte_to_line_index(0, line_byte_starts); result = codeflash_output # 972ns -> 681ns (42.7% faster)

def test_lines_with_consecutive_starts():
    """Test with consecutive line starts (single-character lines)."""
    line_byte_starts = [0, 1, 2, 3, 4, 5]
    codeflash_output = _byte_to_line_index(3, line_byte_starts); result = codeflash_output # 1.02μs -> 642ns (59.2% faster)

def test_negative_like_behavior():
    """Test behavior when bisect_right returns 0, so idx is -1 (clamped to 0 by max)."""
    line_byte_starts = [100]
    codeflash_output = _byte_to_line_index(50, line_byte_starts); result = codeflash_output # 902ns -> 591ns (52.6% faster)

def test_very_large_line_byte_starts():
    """Test with very large byte start values."""
    line_byte_starts = [0, 1000000, 2000000, 3000000]
    codeflash_output = _byte_to_line_index(2500000, line_byte_starts); result = codeflash_output # 922ns -> 601ns (53.4% faster)

def test_offset_at_exact_line_boundary():
    """Test multiple offsets at exact line boundaries."""
    line_byte_starts = [0, 100, 200, 300]
    # Offset at the second line's start maps to index 1 (the second line)
    codeflash_output = _byte_to_line_index(100, line_byte_starts) # 972ns -> 671ns (44.9% faster)
    # Offset at the third line's start maps to index 2 (the third line)
    codeflash_output = _byte_to_line_index(200, line_byte_starts) # 561ns -> 391ns (43.5% faster)
    # Offset at the fourth line's start maps to index 3 (the fourth line)
    codeflash_output = _byte_to_line_index(300, line_byte_starts) # 420ns -> 300ns (40.0% faster)

def test_large_file_1000_lines():
    """Test with 1000 lines (large file scenario)."""
    # Create line starts for a file with 1000 lines, each 50 bytes
    line_byte_starts = [i * 50 for i in range(1000)]
    # Test offset in the middle of the file
    codeflash_output = _byte_to_line_index(25000, line_byte_starts); result = codeflash_output # 1.25μs -> 912ns (37.3% faster)

def test_large_file_offset_near_start():
    """Test large file with offset near the start."""
    line_byte_starts = [i * 50 for i in range(1000)]
    codeflash_output = _byte_to_line_index(100, line_byte_starts); result = codeflash_output # 1.02μs -> 762ns (34.0% faster)

def test_large_file_offset_near_end():
    """Test large file with offset near the end."""
    line_byte_starts = [i * 50 for i in range(1000)]
    codeflash_output = _byte_to_line_index(49900, line_byte_starts); result = codeflash_output # 1.11μs -> 872ns (27.5% faster)

def test_large_file_offset_at_very_end():
    """Test large file with very large offset (beyond file)."""
    line_byte_starts = [i * 50 for i in range(1000)]
    codeflash_output = _byte_to_line_index(100000, line_byte_starts); result = codeflash_output # 1.12μs -> 852ns (31.7% faster)

def test_many_queries_consistency():
    """Test multiple queries on same file for consistency."""
    line_byte_starts = [i * 100 for i in range(500)]
    offsets_to_test = [0, 50, 100, 500, 1000, 15000, 49900]
    expected_results = [0, 0, 1, 5, 10, 150, 499]  # offset 100 is the start of line index 1
    
    for offset, expected in zip(offsets_to_test, expected_results):
        codeflash_output = _byte_to_line_index(offset, line_byte_starts); result = codeflash_output # 4.23μs -> 3.10μs (36.2% faster)

def test_sequential_offsets_monotonic():
    """Test that sequential offsets produce monotonically non-decreasing results."""
    line_byte_starts = [i * 50 for i in range(100)]
    previous_result = -1
    
    # Test offsets at regular intervals
    for offset in range(0, 5000, 100):
        codeflash_output = _byte_to_line_index(offset, line_byte_starts); result = codeflash_output # 20.6μs -> 14.4μs (42.6% faster)
        previous_result = result

def test_binary_search_efficiency():
    """Test that function handles large lists efficiently (via bisect)."""
    # Create a very large line list (1000 lines)
    line_byte_starts = [i * 100 for i in range(1000)]
    
    # Test multiple offsets to ensure bisect works correctly at scale
    test_cases = [
        (0, 0),
        (50, 0),
        (100, 1),  # offset 100 is exactly the start of line index 1
        (5000, 50),
        (50000, 500),
        (99900, 999),
    ]
    
    for offset, expected_line in test_cases:
        codeflash_output = _byte_to_line_index(offset, line_byte_starts); result = codeflash_output # 3.91μs -> 2.75μs (42.2% faster)

def test_dense_line_starts():
    """Test with very dense line starts (every byte)."""
    line_byte_starts = list(range(1000))
    codeflash_output = _byte_to_line_index(500, line_byte_starts); result = codeflash_output # 1.08μs -> 861ns (25.7% faster)

def test_sparse_line_starts():
    """Test with very sparse line starts (large gaps)."""
    line_byte_starts = [0, 10000, 20000, 30000, 40000, 50000]
    codeflash_output = _byte_to_line_index(25000, line_byte_starts); result = codeflash_output # 1.03μs -> 711ns (45.1% faster)

def test_mixed_sized_lines_large_scale():
    """Test with mixed line sizes at large scale."""
    # Create line starts with varying gaps
    line_byte_starts = [0]
    current = 0
    for i in range(500):
        # Alternate between 50 and 100 byte lines
        gap = 50 if i % 2 == 0 else 100
        current += gap
        line_byte_starts.append(current)
    
    codeflash_output = _byte_to_line_index(current // 2, line_byte_starts); result = codeflash_output # 1.05μs -> 701ns (50.1% faster)

def test_performance_large_offset_large_list():
    """Test performance with large offset and large line list."""
    # Create 1000-line file with 1000-byte lines
    line_byte_starts = [i * 1000 for i in range(1000)]
    
    # Test offset far into the file
    codeflash_output = _byte_to_line_index(900000, line_byte_starts); result = codeflash_output # 1.19μs -> 912ns (30.7% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
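The generated tests above are shown without explicit assertions. One way to state the expected values independently is to compare the binary-search mapping against a linear-scan reference. This is a sketch under assumptions: `line_index_bisect` reconstructs the function from the PR description, and `line_index_linear` is a hypothetical oracle that is not part of the PR.

```python
from bisect import bisect_right


def line_index_bisect(byte_offset, line_byte_starts):
    # Assumed shape of the function under test.
    idx = bisect_right(line_byte_starts, byte_offset) - 1
    return max(0, idx)


def line_index_linear(byte_offset, line_byte_starts):
    # Reference: index of the last start <= byte_offset, or 0 if none.
    # Assumes line_byte_starts is sorted in ascending order.
    result = 0
    for i, start in enumerate(line_byte_starts):
        if start <= byte_offset:
            result = i
        else:
            break
    return result


def check_equivalence(cases):
    # Returns the (offset, starts) pairs on which the two disagree.
    return [
        (offset, starts)
        for starts in cases
        for offset in range(-5, (starts[-1] if starts else 0) + 15)
        if line_index_bisect(offset, starts) != line_index_linear(offset, starts)
    ]
```

Calling `check_equivalence([[], [5], [0, 10, 20], [0, 10, 10, 20]])` should return an empty list if the two implementations agree on every offset in range.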

To edit these changes, run `git checkout codeflash/optimize-pr1199-2026-02-20T14.22.56` and push.


claude · 2026-02-20T14:35:24Z

PR Review Summary

Prek Checks

Fixed 3 issues (auto-fixed by ruff):

- I001 unsorted-imports: `from bisect import bisect_right` moved to correct position
- F401 unused-import: removed now-unused `import bisect`
- FURB136 if-expr-min-max: reverted `idx if idx > 0 else 0` back to `max(0, idx)` per linting rules

All prek checks now pass.

Mypy

19 pre-existing errors in instrumentation.py (missing type annotations, untyped functions). None introduced by this PR.

Code Review

No critical issues found. The PR makes a single micro-optimization:

- Replaces bisect.bisect_right() with a directly imported _bisect_right, eliminating attribute lookup overhead on each call.
- The second change, replacing max(0, idx) with idx if idx > 0 else 0, was reverted by the FURB136 lint fix, so the remaining speedup comes only from the direct import binding.
- Logic and behavior are unchanged.
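The remaining change can be inspected at the bytecode level with `dis`. This is a sketch only: the two functions are assumed reconstructions of the before and after state, `attribute_loads` is a hypothetical helper, and opcode names are a CPython implementation detail that varies between versions (the attribute access appears as `LOAD_ATTR` or `LOAD_METHOD` depending on the release).

```python
import bisect
import dis
from bisect import bisect_right


def via_module(byte_offset, line_byte_starts):
    # Looks up the bisect_right attribute on the module at call time.
    return max(0, bisect.bisect_right(line_byte_starts, byte_offset) - 1)


def via_direct_import(byte_offset, line_byte_starts):
    # Uses the name bound at import time; no attribute lookup.
    return max(0, bisect_right(line_byte_starts, byte_offset) - 1)


def attribute_loads(func):
    # Names fetched through an attribute-access opcode in func's bytecode.
    return [
        ins.argval
        for ins in dis.get_instructions(func)
        if ins.opname in ("LOAD_ATTR", "LOAD_METHOD")
    ]
```

On the CPython versions this sketch assumes, `attribute_loads(via_module)` contains `"bisect_right"` while `attribute_loads(via_direct_import)` is empty, which is the per-call lookup the review refers to.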

Test Coverage

File	PR Coverage	Base Coverage	Change
`codeflash/languages/java/instrumentation.py`	83%	83%	0%

Changed lines (import + _bisect_right call) are covered by existing tests
No coverage regression

Last updated: 2026-02-20

codeflash-ai bot added the labels ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) on Feb 20, 2026

codeflash-ai bot mentioned this pull request Feb 20, 2026

codeflash-omni-java #1199

Draft

style: auto-fix linting issues

4fed31b

codeflash-ai[bot] wants to merge 2 commits into `omni-java` from `codeflash/optimize-pr1199-2026-02-20T14.22.56`
